Co-training applied in automatic term extraction

Author

  • L. A. Ha
Abstract

This paper discusses the use of a setting similar to co-training in automatic terminology processing. Two aspects of terms, the internal aspect (linguistic and statistical properties) and the external aspect (contexts), are used alternately in a bootstrapping manner to extract progressively more terms and context patterns. The results show that, starting from only a small set of seed terms, the method extracts terms with higher success rates than other methods. Furthermore, the method can also discover interesting context patterns, which can be used in other terminology processing applications.
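The bootstrapping idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the single-word terms, the (left word, right word) context pattern, the `looks_like_term` filter standing in for the internal view, and the `min_support` threshold are all assumptions made here for illustration only.

```python
from collections import defaultdict

# Hypothetical stopword list; a real system would use linguistic and
# statistical evidence for the "internal" view instead.
STOPWORDS = {"the", "a", "an", "of", "is", "in", "and", "to"}


def contexts(sentences):
    """Yield (left, word, right) for every interior token of each sentence."""
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            yield tokens[i - 1], tokens[i], tokens[i + 1]


def looks_like_term(word):
    """Toy stand-in for the internal view (an assumption, not the paper's test)."""
    return word.isalpha() and word not in STOPWORDS


def bootstrap(sentences, seeds, rounds=3, min_support=2):
    """Alternate between learning context patterns and extracting new terms."""
    terms = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # External view: keep contexts shared by enough distinct known terms.
        support = defaultdict(set)
        for left, word, right in contexts(sentences):
            if word in terms:
                support[(left, right)].add(word)
        patterns |= {p for p, ws in support.items() if len(ws) >= min_support}

        # Use the learned patterns to propose candidates, then filter them
        # with the internal-view stand-in.
        new_terms = {
            word
            for left, word, right in contexts(sentences)
            if (left, right) in patterns and looks_like_term(word)
        } - terms
        if not new_terms:
            break  # nothing new was found, so further rounds cannot change anything
        terms |= new_terms
    return terms, patterns


corpus = [
    "the enzyme binds substrate".split(),
    "the protein binds substrate".split(),
    "the receptor binds substrate".split(),
    "the cat sat down".split(),
]
terms, patterns = bootstrap(corpus, seeds={"enzyme", "protein"})
print(sorted(terms))
print(sorted(patterns))
```

In this toy corpus the two seeds share the context `("the", "binds")`, which then pulls in `receptor` as a new term. A realistic system would need multi-word candidates and a scoring scheme for patterns, neither of which is specified in the abstract.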


Similar resources

Using machine learning to perform automatic term recognition

In this paper a machine learning approach is applied to Automatic Term Recognition (ATR). Similar approaches have been successfully used in Automatic Keyword Extraction (AKE). Using a dataset consisting of Swedish patent texts and validated terms belonging to these texts, unigrams and bigrams are extracted and annotated with linguistic and statistical feature values. Experiments using a varying...

Combining Optimal and Atomic Decomposition of Terminology Association graphs

We introduce novel approaches to graph decomposition based on optimal separators and atoms generated by minimal clique separators. The decomposition process is applied to co-word graphs extracted from the Web of Science database. Two types of graphs are considered: co-keyword graphs based on the human indexation of abstracts and terminology graphs based on semi-automatic term extraction from abstra...

Term Extraction and Mining of Term Relations from Unrestricted Texts in the Financial Domain

In this paper, we present an unsupervised hybrid text-mining approach to automatic acquisition of domain relevant terms and their relations. We deploy the TFIDF-based term classification method to acquire domain relevant terms. Further, we apply two strategies in order to learn lexico-syntactic patterns which indicate paradigmatic and domain relevant syntagmatic relations between the extracted ter...

Automatic segmentation of glioma tumors from BraTS 2018 challenge dataset using a 2D U-Net network

Background: Glioma is the most common primary brain tumor, and early detection of tumors is important in treatment planning for the patient. Precise segmentation of the tumor and intratumoral areas on MRI by a radiologist is the first step in the diagnosis; in addition to being time-consuming, it can yield different diagnoses from different physicians. The aim of this study...

Combining statistics on n-grams for automatic term recognition

This paper presents work in progress on the development of an automatic term recognition (ATR) system built around the Corpus Científico-Técnico (CCT). Terms are modeled using three non-correlated dimensions: unithood, domainhood and usage, applied to a set of n-grams automatically extracted from the corpus. These dimensions are combined with a supervised machine learning algorithm in order ...

Publication date 2003